75 research outputs found

    Completing Low-Rank Matrices with Corrupted Samples from Few Coefficients in General Basis

    Subspace recovery from corrupted and missing data is crucial for various applications in signal processing and information theory. To complete missing values and detect column corruptions, existing robust Matrix Completion (MC) methods mostly concentrate on recovering a low-rank matrix from a few corrupted coefficients w.r.t. the standard basis, which, however, does not extend to more general bases, e.g., the Fourier basis. In this paper, we prove that the range space of an $m \times n$ matrix of rank $r$ can be exactly recovered from a few coefficients w.r.t. a general basis, even when $r$ and the number of corrupted samples are both as high as $O(\min\{m,n\}/\log^3(m+n))$. Our model covers previous ones as special cases, and robust MC can recover the intrinsic matrix at a higher rank. Moreover, we suggest a universal choice of the regularization parameter, $\lambda = 1/\sqrt{\log n}$. Our $\ell_{2,1}$ filtering algorithm, which has theoretical guarantees, further reduces the computational cost of the model. As an application, we also show that the solutions to extended robust Low-Rank Representation and to our extended robust MC are mutually expressible, so both our theory and algorithm can be applied to the subspace clustering problem with missing values under certain conditions. Experiments verify our theories. Comment: To appear in IEEE Transactions on Information Theory.
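    Such robust MC formulations typically combine a nuclear-norm term for the low-rank part with an $\ell_{2,1}$ term for the column corruptions. The following NumPy sketch is for illustration only and is not the paper's $\ell_{2,1}$ filtering algorithm: it shows the two proximal operators and a naive alternating scheme on the observed entries, using the suggested $\lambda = 1/\sqrt{\log n}$. The function names and the update rule are assumptions.

    ```python
    import numpy as np

    def svt(X, tau):
        """Singular value thresholding: prox of tau * nuclear norm."""
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

    def col_soft_threshold(X, tau):
        """Column-wise shrinkage: prox of tau * l_{2,1} norm."""
        norms = np.linalg.norm(X, axis=0, keepdims=True)
        return X * np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)

    def robust_mc(M, mask, n_iter=200, mu=1.0):
        """Heuristic surrogate (not the paper's method): split the observed
        entries of M into a low-rank part L and a column-sparse part S by
        alternating proximal steps projected onto the observed pattern."""
        lam = 1.0 / np.sqrt(np.log(M.shape[1]))  # universal lambda from the abstract
        L = np.zeros_like(M)
        S = np.zeros_like(M)
        for _ in range(n_iter):
            R = mask * (M - L - S)               # residual on observed entries
            L = svt(L + R, 1.0 / mu)
            R = mask * (M - L - S)
            S = col_soft_threshold(S + R, lam / mu)
        return L, S
    ```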

    High-quality Image Restoration from Partial Mixed Adaptive-Random Measurements

    A novel framework for constructing an efficient sensing (measurement) matrix, called the mixed adaptive-random (MAR) matrix, is introduced for directly acquiring a compressed image representation. The mixed sampling (sensing) procedure hybridizes adaptive edge measurements extracted from a low-resolution image with uniform random measurements predefined for the high-resolution image to be recovered. The mixed sensing matrix seamlessly captures important information of an image while approximately satisfying the restricted isometry property. To recover the high-resolution image from MAR measurements, a total variation algorithm based on compressive sensing theory is employed to solve the Lagrangian regularization problem. Both peak signal-to-noise ratio and structural similarity results demonstrate that the MAR sensing framework achieves much better recovery performance than completely random sensing. The work is particularly helpful for high-performance, low-cost data acquisition. Comment: 16 pages, 8 figures.
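    As a rough illustration of how such a mixed matrix could be assembled (the exact edge detector, adaptive/random split, and row scaling used by the authors are not specified in the abstract, so everything below is an assumption), a NumPy/SciPy sketch:

    ```python
    import numpy as np
    from scipy import ndimage

    def mar_measurements(lowres, n_total, highres_shape, rng=None):
        """Illustrative mixed adaptive-random (MAR) sensing matrix:
        half of the rows sample pixels at strong-edge locations found in an
        upsampled low-resolution preview (adaptive part); the remaining rows
        are dense i.i.d. Gaussian projections (random part)."""
        rng = np.random.default_rng() if rng is None else rng
        N = int(np.prod(highres_shape))

        # adaptive part: pick the locations with the largest gradient energy
        zoom = (highres_shape[0] / lowres.shape[0], highres_shape[1] / lowres.shape[1])
        preview = ndimage.zoom(lowres, zoom, order=1)
        grad = ndimage.sobel(preview, 0) ** 2 + ndimage.sobel(preview, 1) ** 2
        n_adaptive = n_total // 2
        edge_idx = np.argsort(grad.ravel())[-n_adaptive:]
        A_adaptive = np.zeros((n_adaptive, N))
        A_adaptive[np.arange(n_adaptive), edge_idx] = 1.0  # direct edge-pixel samples

        # random part: predefined Gaussian rows, independent of the image content
        A_random = rng.standard_normal((n_total - n_adaptive, N)) / np.sqrt(N)
        return np.vstack([A_adaptive, A_random])
    ```

    The resulting matrix would then be paired with a TV-regularized solver to recover the high-resolution image from the measurements.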

    Face Recognition from Sequential Sparse 3D Data via Deep Registration

    Previous works have shown that face recognition with highly accurate 3D data is more reliable and less sensitive to pose and illumination variations. Recently, low-cost and portable 3D acquisition techniques such as ToF (Time of Flight) and DoE-based structured light systems make 3D data easily accessible, e.g., via a mobile phone. However, such devices only provide sparse (limited speckles in a structured light system) and noisy 3D data, which cannot support face recognition directly. In this paper, we aim at achieving high-performance face recognition for devices equipped with such modules, which is of great practical value as these devices become widespread. We propose a framework that performs face recognition by fusing a sequence of low-quality 3D data. Because the 3D data are sparse and noisy and cannot be handled well by conventional methods such as the ICP algorithm, we design a PointNet-like Deep Registration Network (DRNet) that works with ordered 3D point coordinates while preserving the ability to mine local structures via convolution. We also develop a novel loss function, based on the quaternion expression, to optimize DRNet; it clearly outperforms other widely used loss functions. For face recognition, we design a deep convolutional network based on the AMSoftmax model that takes the fused 3D depth map as input. Experiments show that DRNet achieves a rotation error of 0.95° and a translation error of 0.28 mm for registration. Face recognition on the fused data achieves 99.2% rank-1 accuracy and 97.5% at FAR = 0.001 on the Bosphorus dataset, which is comparable to state-of-the-art recognition performance based on high-quality data. Comment: To appear in ICB 2019.
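    The abstract does not give the exact form of the quaternion-based loss, so the following is only a minimal sketch of one plausible variant: it penalizes the angular mismatch between unit quaternions (invariant to the q/-q sign ambiguity) plus the translation error. The function name and the weighting term beta are assumptions.

    ```python
    import numpy as np

    def quaternion_pose_loss(q_pred, t_pred, q_gt, t_gt, beta=1.0):
        """Illustrative registration loss on a (quaternion, translation) pose."""
        q_pred = q_pred / np.linalg.norm(q_pred)   # force unit quaternions
        q_gt = q_gt / np.linalg.norm(q_gt)
        # |<q_pred, q_gt>| = 1 exactly when the two rotations agree,
        # regardless of the sign chosen for either quaternion
        dot = np.clip(np.abs(np.dot(q_pred, q_gt)), 0.0, 1.0)
        rot_err = 1.0 - dot
        trans_err = np.linalg.norm(t_pred - t_gt)
        return rot_err + beta * trans_err
    ```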

    Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning

    Video captioning is widely regarded as a fundamental but challenging task in both computer vision and artificial intelligence. The prevalent approach is to map an input video to a variable-length output sentence in a sequence-to-sequence manner via a Recurrent Neural Network (RNN). Nevertheless, RNN training still suffers to some degree from the vanishing/exploding gradient problem, which makes optimization difficult. Moreover, the inherently recurrent dependency in an RNN prevents parallelization within a sequence during training and therefore limits computation. In this paper, we present a novel design, Temporal Deformable Convolutional Encoder-Decoder Networks (dubbed TDConvED), that fully employs convolutions in both the encoder and decoder networks for video captioning. Technically, we exploit convolutional block structures that compute intermediate states over a fixed number of inputs and stack several blocks to capture long-term relationships. The encoder is further equipped with temporal deformable convolution to enable free-form deformation of the temporal sampling. Our model also capitalizes on a temporal attention mechanism for sentence generation. Extensive experiments are conducted on the MSVD and MSR-VTT video captioning datasets, and superior results are reported compared to conventional RNN-based encoder-decoder techniques. More remarkably, TDConvED increases CIDEr-D on MSVD from 58.8% to 67.2%. Comment: AAAI 2019.
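    To illustrate the convolutional-block idea (each intermediate state summarizes a fixed window of inputs, and stacking blocks grows the temporal receptive field), here is a minimal PyTorch sketch of a plain gated temporal convolution block; the deformable offset prediction described in the paper is omitted, and the class name, feature size, and residual wiring are assumptions.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TemporalConvBlock(nn.Module):
        """Gated (non-deformable) temporal convolution over per-frame features."""
        def __init__(self, dim, kernel_size=3):
            super().__init__()
            # 2*dim output channels so the GLU can split into value and gate halves
            self.conv = nn.Conv1d(dim, 2 * dim, kernel_size,
                                  padding=kernel_size // 2)

        def forward(self, x):                    # x: (batch, dim, time)
            h = F.glu(self.conv(x), dim=1)       # gated linear unit over channels
            return x + h                         # residual connection

    # a tiny encoder: stack a few blocks over per-frame features
    encoder = nn.Sequential(*[TemporalConvBlock(512) for _ in range(3)])
    frames = torch.randn(2, 512, 20)             # 2 clips, 512-d features, 20 frames
    states = encoder(frames)                      # (2, 512, 20) intermediate states
    ```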